Determining whether to extend a drain time to copy data blocks from a first storage to a second storage

ABSTRACT

Provided are a computer program product, system, and method for determining whether to extend a drain time to copy data blocks from a first storage to a second storage. A data structure is generated indicating data blocks in the first storage to copy to the second storage. A drain operation is initiated to copy the data blocks indicated in the first storage to the second storage for a drain time period. Write requests to the data blocks indicated in the data structure are queued during the drain time period, wherein the queued write requests are not completed while queued. Metric information based on the writes that occur to data blocks in the first storage is gathered during the drain time period; and in response to expiration of the drain time period, a determination is made from the gathered metric information of whether to continue the drain operation or terminate the drain operation.

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a computer program product, system, and method for determining whether to extend a drain time to copy data blocks from a first storage to a second storage.

2. Description of the Related Art

Disaster recovery systems typically address two types of failures, a sudden catastrophic failure at a single point-in-time or data loss over a period of time. In the second type of gradual disaster, updates to volumes may be lost. To assist in recovery of data updates, a copy of data may be provided at a remote location. Such dual or shadow copies are typically made as the application system is writing new data to a primary storage device. Different copy technologies may be used for maintaining remote copies of data at a secondary site, such as International Business Machines Corporation's (“IBM”) Extended Remote Copy (XRC), Coupled XRC (CXRC), Global Copy, and Global Mirror Copy.

In data mirroring systems, data is maintained in volume pairs. A volume pair is comprised of a volume in a primary storage device and a corresponding volume in a secondary storage device that includes an identical copy of the data maintained in the primary volume. Primary and secondary servers may be used to control access to the primary and secondary storage devices. In certain data mirroring systems, a timer is used to provide a uniform time across systems so that updates written by different applications to different primary storage devices use a consistent time-of-day (TOD) value as a time stamp. The host operating system or the application may time stamp updates to a data set or set of data sets when writing such data sets to volumes in the primary storage. The integrity of data updates is related to ensuring that updates are done at the secondary volumes in the volume pair in the same order as they were done on the primary volume. The time stamp provided by the application program determines the logical sequence of data updates.

In many application programs, such as database systems, certain writes cannot occur unless a previous write occurred; otherwise the data integrity would be jeopardized. Such a data write whose integrity is dependent on the occurrence of a previous data write is known as a dependent write. Volumes in the primary and secondary storages are consistent when all writes have been transferred in their logical order, i.e., all dependent writes transferred first before the writes dependent thereon. A consistency group has a consistency time for all data writes in a consistency group having a time stamp equal to or earlier than the consistency time stamp. A consistency group is a collection of updates to the primary volumes such that dependent writes are secured in a consistent manner. The consistency time is the latest time to which the system guarantees that updates to the secondary volumes are consistent. Consistency groups maintain data consistency across volumes and storage devices. Thus, when data is recovered from the secondary volumes, the recovered data will be consistent.

One technique to provide a consistent point-in-time copy of data is to suspend all writes to the primary storage and then while writes are suspended copy all the data to mirror to the secondary storage or backup device. A disadvantage of this technique is that host writes are suspended for the time to create a point-in-time copy of data, which may adversely affect application processing at the host. An alternative technique is to establish a logical copy of data at the primary storage, which takes a very short period of time, such as no more than a second or two. Thus, suspending host writes to the primary storage during the time to establish the logical copy is far less disruptive to host application processing than would occur if host writes were suspended for the time to copy all the source data to the target volume. After establishing the logical copy, source volume data subject to an update is copied to a target volume so that the target volume has the data as of the point-in-time the logical copy was established at the primary storage, before the update. This defers the physical copying until an update is received. This logical copy operation is performed to minimize the time during which the target and source volumes are inaccessible.
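As an illustration only, the deferred physical copying of a logical copy can be sketched in Python as follows. The class name LogicalCopy and its methods are hypothetical and are not taken from any described product; the sketch assumes the source volume is a simple in-memory list of blocks.

```python
class LogicalCopy:
    """Hypothetical copy-on-write sketch of a logical point-in-time copy."""

    def __init__(self, source_blocks):
        self.source = source_blocks  # live source volume (list of block contents)
        self.target = {}             # blocks preserved as of the point-in-time

    def write(self, block, data):
        # Before the first update to a block, preserve its point-in-time content.
        if block not in self.target:
            self.target[block] = self.source[block]
        self.source[block] = data

    def read_point_in_time(self, block):
        # Unmodified blocks are still read from the source volume.
        return self.target.get(block, self.source[block])


volume = ["a0", "b0", "c0"]
snapshot = LogicalCopy(volume)
snapshot.write(1, "b1")
snapshot.write(1, "b2")
```

In this sketch, establishing the copy only allocates an empty mapping, which is why the suspension of host writes can be brief; a block is physically copied only when it is first updated.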

To drain or copy a consistency group of data from a primary storage to a secondary storage, the primary system maintains an indication of the blocks of data in the consistency group to drain. During the drain operation, host writes to the data being drained, also known as collisions, are delayed until the drain of that block completes. Thus, the drain operation may have a negative impact on host performance because the completion of write requests is delayed. A drain time period may be set during which the data in the consistency group is copied to the secondary storage and host writes to the data not copied over, i.e., collisions, are queued and delayed. The drain operation is failed if the draining of all the data blocks in the consistency group is not completed within the drain time period. In such case, the operations to drain the consistency group must be performed again and subject to the same risk that the drain operation may not complete within the time provided for the drain operations.

There is a need in the art for improved techniques for draining data from a first storage to a second storage.

SUMMARY

Provided are a computer program product, system, and method for determining whether to extend a drain time to copy data blocks from a first storage to a second storage. A data structure is generated indicating data blocks in the first storage to copy to the second storage. A drain operation is initiated to copy the data blocks indicated in the first storage to the second storage for a drain time period. Write requests to the data blocks indicated in the data structure are queued during the drain time period, wherein the queued write requests are not completed while queued. Metric information based on the writes that occur to data blocks in the first storage is gathered during the drain time period; and in response to expiration of the drain time period, a determination is made from the gathered metric information of whether to continue the drain operation or terminate the drain operation.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 illustrates an embodiment of a network computing environment.

FIG. 2 illustrates an embodiment of gathered metric information.

FIGS. 3A and 3B illustrate an embodiment of operations to drain data from a first storage to a second storage.

FIG. 4 illustrates an embodiment of operations to process a write request to the first storage.

FIG. 5 illustrates an embodiment of operations to process an acknowledgment that a data block was successfully copied from the first storage to the second storage.

FIG. 6 illustrates an embodiment of a computer architecture.

DETAILED DESCRIPTION

FIG. 1 illustrates an embodiment of a network computing environment. A first server 2 manages Input/Output (I/O) requests from one or more host systems 4 to a primary storage 6 in which storage volumes 8, referred to as source volumes 8, are configured. The first server 2 includes a storage manager 10 program that manages I/O requests to the primary storage volumes 8 and may mirror updates to source volumes 8 to target volumes 12 in a secondary storage 16. A second server 18 includes a storage manager 20 program to manage I/O access to the second storage 16 and management communication with the first server 2.

The first 2 and second 18 servers include a cache 22 and 24, respectively, to buffer read and write data to their corresponding storage 6 and 16. Both the first 2 and second 18 servers may receive read and write requests from host systems 4.

The primary storage manager 10 may form a consistency group of updates in the source volume 8 to copy to the target volume 12 that are consistent as of a point-in-time. To copy the consistency group of updates to the secondary storage 16, the storage manager 10 maintains a copy data structure 26 indicating data blocks in the consistency group in the source volume 8 (first storage 6) to copy to the target volume 12 (second storage 16). The storage manager 10 initiates a drain operation to copy or drain updates in the consistency group indicated in the copy data structure 26 to the target volume 12. During the drain operation, the storage manager 10 queues host updates to data blocks indicated in the copy data structure 26 as not copied to the target volume 12 in a collision queue 30. The storage manager 10 gathers metric information 28 on write performance while draining data in the copy data structure 26 to the target volume 12.

In one embodiment, the copy data structure 26 may comprise a bitmap having an entry for each block in the consistency group or other group to copy, where one bit value indicates the block has not been copied to the second storage 16 and the other bit value indicates the block has been copied. The drain operation is performed during a drain time period, during which host 4 writes to the data blocks being drained are queued in the collision queue 30. The storage manager 10 uses performance metric thresholds 32 to determine whether to extend the drain time of the drain operation during which host 4 updates to data being drained are queued. The determination of whether to extend the drain time period may be based on balancing the effects on host performance that occur from collisions, i.e., delaying processing of write requests to data that has not yet been drained, with the goal of completing the drain operation. The data being drained may be part of a consistency group or may comprise another grouping of data not part of a consistency group.
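By way of example only, a bitmap form of the copy data structure 26 may be sketched in Python as follows. The class name CopyBitmap, its method names, and the choice that a bit value of 1 means "not yet copied" are assumptions made for illustration; the described embodiments do not prescribe which bit value carries which meaning.

```python
class CopyBitmap:
    """Hypothetical bitmap: bit 1 = block not yet copied, bit 0 = copied."""

    def __init__(self, num_blocks, blocks_to_copy):
        self.num_blocks = num_blocks
        self.bits = bytearray((num_blocks + 7) // 8)
        for block in blocks_to_copy:
            self.bits[block // 8] |= 1 << (block % 8)

    def needs_copy(self, block):
        # True while the block is still waiting to be drained.
        return bool(self.bits[block // 8] & (1 << (block % 8)))

    def mark_copied(self, block):
        # Clear the bit once the block has been copied to the second storage.
        self.bits[block // 8] &= ~(1 << (block % 8)) & 0xFF

    def remaining(self):
        # Number of blocks still to drain.
        return sum(bin(byte).count("1") for byte in self.bits)


bitmap = CopyBitmap(num_blocks=16, blocks_to_copy=[1, 5, 9])
bitmap.mark_copied(5)
```

In such a sketch, a host write to a block for which needs_copy returns true would be treated as a collision, and remaining gives the blocks-to-drain count.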

FIG. 2 illustrates an embodiment of the gathered metric information 28 during the drain operation, which may include a number of received write requests 50 (to data being drained and data not subject to draining), completed write requests 52, collisions 54 (i.e., host 4 updates directed to data blocks indicated in the copy data structure 26 as not copied to the target volume 12), and an average time to complete write requests 56. The metric information 50, 52, 54, and 56 may be updated during the drain operations. In certain embodiments, the metric information 28 may include some or all of the metric information 50, 52, 54, and 56 and additional metric information gathered during the drain operation.
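As a non-limiting sketch, the gathered metric information 28 may be held in a small Python record such as the following. The class name GatheredMetrics, the field names, and the use of a running total to derive the average are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class GatheredMetrics:
    """Hypothetical container for the metrics 50, 52, 54, and 56."""

    received_writes: int = 0             # received write requests 50
    completed_writes: int = 0            # completed write requests 52
    collisions: int = 0                  # collisions 54
    total_completion_time: float = 0.0   # running total used to derive 56

    def record_received(self):
        self.received_writes += 1

    def record_collision(self):
        self.collisions += 1

    def record_completed(self, elapsed_seconds):
        self.completed_writes += 1
        self.total_completion_time += elapsed_seconds

    def average_completion_time(self):
        # Average time to complete write requests 56.
        if self.completed_writes == 0:
            return 0.0
        return self.total_completion_time / self.completed_writes

    def collision_ratio(self):
        # Ratio of collisions to received write requests.
        if self.received_writes == 0:
            return 0.0
        return self.collisions / self.received_writes


metrics = GatheredMetrics()
for _ in range(4):
    metrics.record_received()
metrics.record_collision()
metrics.record_completed(0.5)
metrics.record_completed(1.5)
```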

In one embodiment, the performance metric threshold 32 may indicate a collision ratio threshold of collisions to write requests, such that if the actual collision ratio as indicated in the gathered metric information 28 is below the threshold, then the drain operation may continue. In this case, the drain operation is not significantly impacting host I/Os because host update collisions are low. Alternatively, the performance metric threshold 32 may indicate an average write time. In such case, if the time to complete writes, as indicated in the gathered metric information 28, is below the average write time threshold, then the drain operation may continue because the drain operation is not resulting in write completion time exceeding a threshold.

In yet further embodiments, the performance metric threshold 32 indicates a collision threshold and a blocks-to-drain threshold, such that if the actual collisions are below the collision threshold and the remaining blocks-to-drain are below the blocks-to-drain threshold, then the drain operation may continue. In this case, host impact is not significant as indicated by the relatively low collisions and the drain operation is close to completion, as indicated by the remaining blocks-to-drain.
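The three kinds of threshold checks described above may be sketched, purely for illustration, as the following Python predicates. The function names and the treatment of a period with no write requests as "continue" are assumptions, as is the use of a strict less-than comparison.

```python
def continue_by_collision_ratio(collisions, write_requests, ratio_threshold):
    """Continue when the ratio of collisions to write requests is below the threshold."""
    if write_requests == 0:
        return True  # assumption: no host writes means no host impact
    return collisions / write_requests < ratio_threshold


def continue_by_average_write_time(average_write_seconds, time_threshold):
    """Continue when the average write completion time is below the threshold."""
    return average_write_seconds < time_threshold


def continue_by_collisions_and_remaining(collisions, remaining_blocks,
                                         collision_threshold, blocks_threshold):
    """Continue when collisions are low and the drain is close to completion."""
    return collisions < collision_threshold and remaining_blocks < blocks_threshold
```

For instance, with 5 collisions out of 100 write requests and a hypothetical ratio threshold of 0.10, the first predicate indicates that the drain operation may continue.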

In certain embodiments, the performance metric thresholds 32 may include different thresholds to apply at the end of different drain time periods to dynamically alter the determinations made to continue drain operations as the duration of the drain operation is extended. For instance, the thresholds applied later on during the draining process to determine whether to continue draining may be more stringent, such as requiring a lower collision ratio, than the thresholds used early in the drain process. The actual performance metric thresholds may be determined by empirical studies in a test environment.
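One possible way to represent progressively more stringent thresholds is an ordered list indexed by the number of the check, as in the following Python sketch. The function name threshold_for_check, the numeric values, and the choice to reuse the last threshold for all later checks are assumptions made for illustration only.

```python
def threshold_for_check(thresholds, check_index):
    """Return the threshold for the given check (0-based).

    Assumption: once the list is exhausted, the last threshold is reused.
    """
    return thresholds[min(check_index, len(thresholds) - 1)]


# Hypothetical collision ratio thresholds; real values would be found empirically.
collision_ratio_thresholds = [0.10, 0.05, 0.02]

first = threshold_for_check(collision_ratio_thresholds, 0)
later = threshold_for_check(collision_ratio_thresholds, 5)
```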

The first 2 and second 18 servers and host 4 may communicate over a network 34. The network 34 may comprise a Storage Area Network (SAN), Local Area Network (LAN), Intranet, the Internet, Wide Area Network (WAN), peer-to-peer network, wireless network, arbitrated loop network, etc. The servers 2 and 18 may comprise an enterprise storage server, storage controller, blade server, general purpose server, desktop computer, workstation, telephony device, personal digital assistant (PDA), etc., or other device used to manage I/O requests to attached storages 6 and 16. The storages 6 and 16 may each comprise storage media implemented in one or more storage devices known in the art, such as interconnected hard disk drives (e.g., configured as a DASD, RAID, JBOD, etc.), magnetic tape, solid state storage devices (e.g., EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, flash disk, storage-class memory (SCM)), electronic memory, etc. The servers 2 and 18 and storages 6 and 16 may be implemented in a distributed storage environment or network storage environment, such as “cloud” storage. The volumes configured in the storage systems may comprise a logical arrangement of tracks or blocks of data.

The storage managers 10 and 20 may be implemented as one or more software programs loaded into a memory and executed by processors in the servers 2 and 18. In an alternative embodiment, the storage managers 10 and 20 may be implemented with hardware logic, such as an Application Specific Integrated Circuit (ASIC), or as a programmable processor executing code in a computer readable storage medium.

FIGS. 3A and 3B illustrate operations performed by the storage manager 10 to drain the data blocks indicated in the copy data structure 26 to the second storage 16. Upon initiating (at block 100) the drain operations, the storage manager 10 generates (at block 102) a copy data structure 26 indicating data blocks in the first storage 6 to copy to the second storage 16, which may comprise updated data in the source volume 8 in a consistency group to copy a consistency group of data consistent as of a point-in-time to the second storage 16, or may comprise data not in a consistency group. The storage manager 10 starts (at block 104) drain operations to copy the data blocks indicated in the copy data structure 26 in the first storage 6 to the second storage 16 for a first drain time period. If (at block 106) the first drain time period has not expired, then the storage manager 10 continues (at block 108) the drain operation to copy data blocks indicated in the copy data structure 26 to the second storage 16. When the first drain time period expires (from the yes branch of block 106), then the storage manager 10 determines (at block 110) a first metric value based on the gathered metric information 28, such as a first collision ratio, first average write completion time, first remaining blocks-to-drain and first number of collisions, etc. The storage manager 10 compares (at block 112) the determined first metric value to a first performance metric threshold in the thresholds 32, such as by comparing the determined collision ratio to a first collision ratio threshold, comparing a determined average write request completion time to a first threshold, comparing the determined remaining blocks-to-drain and number of collisions to thresholds, etc.

If (at block 114) the comparison indicates to not continue the operation, such as if the determined collision ratio is above a collision threshold, i.e., too many collisions, then the storage manager 10 terminates (at block 116) the drain operation and fails the copy operation to second storage 16. In this way, the copying of the consistency group fails and has to be retried because operating conditions, such as collisions, average write request completion time, etc., indicate that the burdens on host performance outweigh the advantages of completing the drain operation, so the drain operation is terminated. If (from the yes branch at block 114) the comparison indicates to continue the drain operation, then the storage manager 10 continues (at block 118) draining data indicated in the copy data structure 26 and gathering metric information 28 for a second drain time period. During the second drain time period, the drain operations continue (from the no branch of block 120). When the second drain time period expires (from the yes branch of block 120), the storage manager 10 determines (at block 122) a second metric value based on the gathered metric information 28, such as a second collision ratio, a second average write completion time, second remaining blocks-to-drain and second number of collisions, etc. The determined second metric value is compared (at block 124) to a second performance metric threshold. In one embodiment, the second performance metric threshold may be more stringent or different than the first performance metric threshold, such as a lower collision ratio threshold, lower average write completion time threshold, lower remaining blocks-to-drain threshold, lower number of collisions threshold, such that the measured metric value has to indicate even lower burdens on the hosts in order to continue the drain operation for an even longer period of time.

With respect to FIG. 3B, if (at block 126) the comparison indicates to not continue the operation, such as if the determined second collision ratio is above a second collision threshold, i.e., too many collisions, then the storage manager 10 terminates (at block 128) the drain operation and fails the copy operation to second storage 16. If (from the yes branch at block 126) the comparison indicates to continue the drain operation, then the storage manager 10 continues (at block 130) draining data indicated in the copy data structure 26 and gathering metric information 28 for a third drain time period. During the third drain time period, the drain operations continue (from the no branch of block 132). When the third drain time period expires (from the yes branch of block 132), the storage manager 10 determines (at block 134) a third metric value based on the gathered metric information 28, such as a third collision ratio, a third average write completion time, third remaining blocks-to-drain, third number of collisions, etc. The determined third metric value is compared (at block 136) to a second (or third or other) performance metric threshold. If (at block 138) the comparison indicates to continue the drain operation, i.e., the determined metric falls below the threshold, then control proceeds back to block 130 to continue the drain operation. Otherwise, if (from the no branch of block 138) the comparison indicates that the drain operation is not to continue, then control proceeds back to block 128 to terminate the drain operation.

In one embodiment, once the third drain time period expires, the drain operation will terminate if not completed. In an alternative embodiment, the drain operation may continue to perform iterations based on the comparison at block 138 after each drain time period expires until completion of the drain operation if the measured performance metrics fall below the thresholds, indicating that the burdens on the hosts due to the drain operations are not sufficiently significant to outweigh the goal of completing the drain operation. In one embodiment, the second performance metric threshold may be used continually for all checks after the second check. In alternative embodiments, the performance metric threshold used for subsequent checks may change each time. In an alternative embodiment, the first and second performance metric thresholds may comprise the same value, such that the threshold remains constant through all checks on whether to continue with the drain operation. The time periods may comprise several seconds.
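The overall flow of FIGS. 3A and 3B can be approximated by the following simplified Python simulation. It is a sketch under stated assumptions, not the described implementation: the function run_drain and its parameters are hypothetical, each drain time period is modeled as copying a fixed number of blocks, the collision ratio measured at the end of each period is supplied as input, the last list entry is reused for later periods, and a ratio equal to the threshold is treated as a reason to terminate.

```python
def run_drain(total_blocks, blocks_per_period, collision_ratios, thresholds,
              max_periods=None):
    """Simulate period-by-period draining; return (outcome, periods_used)."""
    if blocks_per_period <= 0:
        raise ValueError("blocks_per_period must be positive")
    remaining = total_blocks
    period = 0
    while True:
        # Drain for one drain time period.
        remaining = max(0, remaining - blocks_per_period)
        period += 1
        if remaining == 0:
            return "completed", period
        # Optional hard limit, e.g. terminate after the third period.
        if max_periods is not None and period >= max_periods:
            return "terminated", period
        # Compare the measured metric with the threshold for this check.
        ratio = collision_ratios[min(period - 1, len(collision_ratios) - 1)]
        threshold = thresholds[min(period - 1, len(thresholds) - 1)]
        if ratio >= threshold:
            return "terminated", period


extended = run_drain(100, 40, [0.04, 0.03], [0.10, 0.05])
failed = run_drain(100, 40, [0.04, 0.08], [0.10, 0.05])
capped = run_drain(100, 10, [0.01], [0.10], max_periods=3)
```

In the first call the drain is extended twice and completes in the third period; in the second call the more stringent second threshold causes termination; in the third call the hypothetical three-period limit ends the drain even though collisions are low.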

In an embodiment where the gathered metric information comprises a number of collisions and number of write requests and the performance metric thresholds comprise collision ratio thresholds, i.e., the ratio of collisions to write requests, then the drain operation continues if the measured collision ratio is less than the threshold used during the checks at blocks 114, 126, and 138, and fails if the measured collision ratio is above the one or more thresholds used, indicating that there is an unacceptably high ratio of collisions adversely affecting host performance. In certain embodiments, the second collision ratio threshold used during a check after the first check, such as at blocks 126 and 138 in FIG. 3B, is lower than the first collision ratio threshold used, thus requiring that collisions and effect on host 4 performance be even less to further continue the drain operation, such as in FIG. 3B.

In further embodiments, the gathered metric information comprises an average write completion time that is calculated across writes to data blocks both indicated and not indicated in the copy data structure 26. The drain operations continue if the measured average write completion time is less than the threshold used during the checks at blocks 114, 126, and 138 and fail if the measured write completion time is above the one or more thresholds used, indicating that write completion time is unacceptably high and negatively impacting host 4 performance. In certain embodiments, the second write completion time threshold used during a check after the first check, such as at blocks 126 and 138 in FIG. 3B, is lower than the first write completion time threshold used, thus requiring that write completion time and effect on host 4 performance be even less to further continue the drain operation, such as in FIG. 3B.

In a further embodiment, the gathered metric information may comprise a number of collisions and the remaining blocks-to-drain, as indicated in the copy data structure 26. In such case, the drain operation continues if the measured number of collisions is less than a collision number threshold and the remaining blocks-to-drain is below a blocks-to-drain threshold used during the checks at blocks 114, 126, and 138. The drain operation fails if the measured number of collisions and blocks-to-drain metrics are above the thresholds, indicating that the negative impact on host performance, as indicated by the number of collisions exceeding the threshold, is unacceptably high, and that the benefit of continuing the drain operation is limited because the drain operation is not close to completion, i.e., the remaining blocks-to-drain exceeds the threshold. In certain embodiments, the second collision number and blocks-to-drain thresholds used during a check after the first check, such as at blocks 126 and 138 in FIG. 3B, are lower than the first thresholds, thus requiring that the effect on host performance, as indicated by the number of collisions, and the remaining work, as indicated by the remaining blocks-to-drain, be even less than before to further continue the drain operation, such as in FIG. 3B.

FIG. 4 illustrates an embodiment of operations performed by the storage manager 10 to process a received write request to a data block in the first storage 6. Upon receiving (at block 200) the write request, the storage manager 10 records (at block 202) metric information 28 for the received write request, such as indicating that a write request was received and the start time of receiving the write request. If (at block 204) the write request is directed to a data block indicated in the copy data structure 26 as copied to the second storage 16, then the storage manager 10 writes (at block 206) the received data to the data block in the first storage 6 and records (at block 208) metrics for the completed write request, such as the time to complete the write request. If (at block 204) the write request is to a data block indicated in the copy data structure 26 as not copied, i.e., not drained, to the second storage 16, then there is a collision and the storage manager 10 queues (at block 210) the received write request in the collision queue 30 and records (at block 212) metrics indicating the collision.
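The write handling of FIG. 4 may be sketched in Python as follows. This is an illustrative simplification: the function name process_write, the use of a set for blocks not yet copied, a dictionary for the first storage, and a plain dictionary of counters are all assumptions, and timing of write completion is omitted.

```python
from collections import deque


def process_write(block, data, not_copied, storage, collision_queue, metrics):
    """Apply or queue a host write; return "written" or "queued"."""
    metrics["received"] += 1                   # record the received write
    if block in not_copied:
        # Collision: the block has not yet been drained, so delay the write.
        collision_queue.append((block, data))
        metrics["collisions"] += 1
        return "queued"
    storage[block] = data                      # write to the first storage
    metrics["completed"] += 1
    return "written"


not_copied = {2, 3}                  # blocks still to drain (hypothetical)
storage = {1: "old1", 2: "old2"}
collision_queue = deque()
metrics = {"received": 0, "completed": 0, "collisions": 0}

first_result = process_write(1, "new1", not_copied, storage, collision_queue, metrics)
second_result = process_write(2, "new2", not_copied, storage, collision_queue, metrics)
```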

FIG. 5 illustrates an embodiment of operations performed by the storage manager 10 to process an acknowledgement from the second server 18 that the data block was successfully copied to the second storage 16. Upon receiving (at block 250) the acknowledgment that the copy operation of the data block to the second storage 16 completed, if (at block 251) the completed write was not being drained, i.e., not indicated in the copy data structure 26, then control ends. Otherwise, if (at block 251) the completed write is to a data block indicated in the copy data structure 26, then the storage manager 10 updates (at block 252) the copy data structure 26 to indicate that the data block was copied to the second storage 16. If (at block 254) the collision queue 30 includes an update for the data block acknowledged as copied to the second storage 16, then that queued write operation can be completed. In such case, the storage manager 10 removes (at block 256) the queued write request from the collision queue 30 and writes (at block 258) the received data of the dequeued write request to the data block in the first storage 6. Metrics for the completed write, such as the time to complete, are recorded (at block 260). If the completed write is not for a block subject to a write request in the collision queue 30 (from the no branch of block 254) or after the queued write is applied (from block 260), the storage manager 10 determines (at block 262) whether the copy data structure 26 indicates that all blocks have been copied to the second storage 16. If so, then all drain operations occurring in FIGS. 3A and 3B are terminated (at block 264) and control ends. If (at block 262) the drain operation is still ongoing, i.e., not all blocks have yet been copied to the second storage 16, then control ends.
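The acknowledgment handling of FIG. 5 may be sketched in Python as follows. The function name process_copy_ack and the in-memory structures (a set of blocks not yet copied, a dictionary for the first storage, a deque for the collision queue) are assumptions made for illustration; the sketch applies every queued write for the acknowledged block in arrival order and returns whether this acknowledgment completed the drain.

```python
from collections import deque


def process_copy_ack(block, not_copied, storage, collision_queue, metrics):
    """Handle a copy acknowledgment; return True if the drain is now complete."""
    if block not in not_copied:
        return False                       # block was not being drained
    not_copied.discard(block)              # mark the block as copied
    still_queued = deque()
    while collision_queue:
        queued_block, data = collision_queue.popleft()
        if queued_block == block:
            storage[block] = data          # complete the delayed host write
            metrics["completed"] += 1
        else:
            still_queued.append((queued_block, data))
    collision_queue.extend(still_queued)   # keep writes for other blocks queued
    return len(not_copied) == 0


not_copied = {1, 2}
storage = {}
collision_queue = deque([(1, "a"), (2, "b"), (1, "c")])
metrics = {"completed": 0}

done_after_first = process_copy_ack(1, not_copied, storage, collision_queue, metrics)
queue_after_first = list(collision_queue)
done_after_second = process_copy_ack(2, not_copied, storage, collision_queue, metrics)
```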

In the described embodiments, the drain time may be adjusted during drain operations to take into account a collision ratio of collisions to write requests, the completion time of write requests, the number of write requests, and the remaining blocks-to-drain. In this way, if the current drain operations are not unduly affecting host performance, as indicated by comparing a measured performance metric to a performance metric threshold indicating a host performance impact, then the drain operation may continue. Otherwise, if the determined impact on host performance is unacceptable, then the drain operation may be terminated. Further, the described embodiments may factor in the amount of time remaining in the drain operation, such as if the number of blocks-to-drain is low, then the benefit of completing draining successfully may outweigh the impact of collisions on host performance for the relatively small amount of time needed to complete the relatively few blocks remaining to drain.

Additional Embodiment Details

The described operations may be implemented as a method, apparatus or computer program product using standard programming and/or engineering techniques to produce software, firmware, hardware, or any combination thereof. Accordingly, aspects of the embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the embodiments may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present invention are described below with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

In certain embodiments, the system of FIG. 1 may be implemented as a cloud component part in a cloud computing environment. In the cloud computing environment, the systems architecture of the hardware and software components involved in the delivery of cloud computing may comprise a plurality of cloud components communicating with each other over a network, such as the Internet. For example, in certain embodiments, the first server of FIG. 1 may provide hosts and clients data mirroring services in a networked cloud.

FIG. 6 illustrates an embodiment of a computer architecture 300 that may be implemented at the servers 2 and 18 and hosts 4 in FIG. 1. The architecture 300 may include a processor 302 (e.g., a microprocessor), a memory 304 (e.g., a volatile memory device), and storage 306 (e.g., a non-volatile storage, such as magnetic disk drives, optical disk drives, a tape drive, etc.). The storage 306 may comprise an internal storage device or an attached or network accessible storage. Programs, including an operating system 308 and the storage manager 10, 20, in the storage 306 are loaded into the memory 304 and executed by the processor 302. The memory 304 may further include the cache 20, 22, collision queue 30, performance metric thresholds 32, gathered metric information 28, and copy data structure 26. The architecture further includes a network card 310 to enable communication with the network 30. An input device 312 is used to provide user input to the processor 302, and may include a keyboard, mouse, pen-stylus, microphone, touch sensitive display screen, or any other activation or input mechanism known in the art. An output device 314 is capable of rendering information transmitted from the processor 302, or other component, such as a display monitor, printer, storage, etc.

The terms “an embodiment”, “embodiment”, “embodiments”, “the embodiment”, “the embodiments”, “one or more embodiments”, “some embodiments”, and “one embodiment” mean “one or more (but not all) embodiments of the present invention(s)” unless expressly specified otherwise.

The terms “including”, “comprising”, “having” and variations thereof mean “including but not limited to”, unless expressly specified otherwise.

The enumerated listing of items does not imply that any or all of the items are mutually exclusive, unless expressly specified otherwise.

The terms “a”, “an” and “the” mean “one or more”, unless expressly specified otherwise.

Devices that are in communication with each other need not be in continuous communication with each other, unless expressly specified otherwise. In addition, devices that are in communication with each other may communicate directly or indirectly through one or more intermediaries.

A description of an embodiment with several components in communication with each other does not imply that all such components are required. On the contrary, a variety of optional components are described to illustrate the wide variety of possible embodiments of the present invention.

Further, although process steps, method steps, algorithms or the like may be described in a sequential order, such processes, methods and algorithms may be configured to work in alternate orders. In other words, any sequence or order of steps that may be described does not necessarily indicate a requirement that the steps be performed in that order. The steps of processes described herein may be performed in any order practical. Further, some steps may be performed simultaneously.

When a single device or article is described herein, it will be readily apparent that more than one device/article (whether or not they cooperate) may be used in place of a single device/article. Similarly, where more than one device or article is described herein (whether or not they cooperate), it will be readily apparent that a single device/article may be used in place of the more than one device or article or a different number of devices/articles may be used instead of the shown number of devices or programs. The functionality and/or the features of a device may be alternatively embodied by one or more other devices which are not explicitly described as having such functionality/features. Thus, other embodiments of the present invention need not include the device itself.

The illustrated operations of FIGS. 3A, 3B, 4, and 5 show certain events occurring in a certain order. In alternative embodiments, certain operations may be performed in a different order, modified or removed. Moreover, steps may be added to the above described logic and still conform to the described embodiments. Further, operations described herein may occur sequentially or certain operations may be processed in parallel. Yet further, operations may be performed by a single processing unit or by distributed processing units.

The foregoing description of various embodiments of the invention has been presented for the purposes of illustration and description. It is not intended to be exhaustive or to limit the invention to the precise form disclosed. Many modifications and variations are possible in light of the above teaching. It is intended that the scope of the invention be limited not by this detailed description, but rather by the claims appended hereto. The above specification, examples and data provide a complete description of the manufacture and use of the composition of the invention. Since many embodiments of the invention can be made without departing from the spirit and scope of the invention, the invention resides in the claims hereinafter appended.

1. A computer program product for copying data from a first storage to a second storage, the computer program product comprising a computer readable storage medium having computer readable program code embodied therein that executes to perform operations, the operations comprising: generating a data structure indicating data blocks in the first storage to copy to the second storage; initiating a drain operation to copy the data blocks indicated in the first storage to the second storage for a drain time period; queuing write requests to the data blocks indicated in the data structure during the drain time period, wherein the queued write requests are not completed while queued; gathering metric information based on the writes that occur to data blocks in the first storage during the drain time period; and in response to expiration of the drain time period, determining from the gathered metric information whether to continue the drain operation or terminate the drain operation.
 2. The computer program product of claim 1, wherein the operations further comprise: updating the data structure to indicate that a data block indicated in the data structure was copied to the second storage in response to copying the data block from the first storage to the second storage; determining whether a received write request is directed to a data block indicated in the data structure as not copied to the second storage, wherein the write request to the data block indicated in the data structure as not copied to the second storage is queued; and performing the write request to the data block in response to determining that the data structure does not indicate the data block as not copied to the second storage.
 3. The computer program product of claim 1, wherein the determining from the gathered metric information of whether to continue the drain operation comprises: providing a performance metric threshold; determining a metric value based on the gathered metric information; and comparing the determined metric value to the performance metric threshold to determine whether to continue the drain operation.
 4. The computer program product of claim 3, wherein the performance metric threshold comprises a first performance metric threshold, wherein the drain time period comprises a first drain time period, wherein the determined metric value comprises a first metric value, wherein the drain operation is continued for a second drain time period in response to the continuing of the drain operation, and wherein the operations further comprise: continuing to gather metric information during the second drain time period; in response to the second drain time period expiring, determining a second metric value based on the gathered metric information; and comparing the determined second metric value to a second performance metric threshold to determine whether to continue the drain operation.
 5. The computer program product of claim 4, wherein the operations further comprise, until the drain operation has completed, resulting in all the data blocks indicated in the data structure being copied to the second storage, performing iterations of the following operations: continuing the drain operation for an additional drain time period in response to determining to continue the drain operation in response to comparing the second metric value or a subsequently calculated additional metric value with the second performance metric threshold; continuing to gather metric information during the additional drain time period; in response to the additional drain time period expiring before all the data blocks indicated in the data structure have been copied to the second storage, determining an additional metric value based on the gathered metric information; determining whether the determined additional metric value is less than the second performance metric threshold; and returning to the continuing of the drain operation for the additional drain time period.
 6. The computer program product of claim 3, wherein the gathered metric information indicates received write requests and collisions, wherein collisions comprise write requests directed to data blocks indicated in the data structure as not copied to the second storage, wherein the performance metric threshold indicates a collision ratio threshold, wherein the determined metric value comprises a determined collision ratio of a number of collisions and number of received write requests indicated in the gathered metric information, and wherein the drain operation is continued in response to determining that the determined collision ratio is less than the collision ratio threshold.
 7. The computer program product of claim 6, wherein the collision ratio threshold comprises a first collision ratio threshold, wherein the drain time period comprises a first drain time period, wherein the determined collision ratio comprises a first collision ratio, wherein the drain operation is continued for a second drain time period in response to the continuing of the drain operation, wherein the operations further comprise: continuing to gather metric information indicating write requests received and write requests directed to data blocks indicated in the data structure as not copied to the second storage during the second drain time period; in response to the second drain time period expiring, determining a second collision ratio based on the gathered metric information; determining whether the determined second collision ratio is less than a second collision ratio threshold; and continuing the drain operation in response to determining that the determined second collision ratio is less than the second collision ratio threshold.
 8. The computer program product of claim 7, wherein the second collision ratio threshold is less than the first collision ratio threshold.
 9. The computer program product of claim 3, wherein the gathered metric information indicates a time to complete received write requests, wherein the performance metric threshold indicates a write time threshold, wherein the determined metric value comprises a determined average write time based on the gathered metric information, and wherein the operations further comprise: continuing the drain operation in response to determining that the determined average write time is less than the write time threshold; and terminating the drain operation in response to determining that the determined average write time is greater than the write time threshold.
 10. The computer program product of claim 3, wherein the gathered metric information indicates a number of blocks indicated in the data structure not yet copied to the second storage and a number of collisions, wherein collisions comprise write requests directed to data blocks indicated in the data structure as not copied to the second storage, wherein the performance metric threshold indicates a collision number threshold and a blocks-to-drain threshold, wherein the determined metric value comprises a determined blocks-to-drain and determined number of collisions, and wherein the operations further comprise: continuing the drain operation in response to determining that the determined blocks-to-drain is less than the blocks-to-drain threshold and the determined number of collisions is less than the collision number threshold.
 11. The computer program product of claim 3, wherein the data blocks indicated in the data structure are part of a consistency group of updated data in the first storage consistent as of a point of time, wherein the operations further comprise: in response to the comparing of the determined metric value to the performance metric threshold indicating to not continue the drain operation, performing: terminating the drain operation; and failing the copy operation of the data blocks indicated in the data structure from the first storage to the second storage.
 12. A system in communication with a first storage and a second storage, comprising: a processor; a computer readable storage medium having code executed by the processor to perform operations, the operations comprising: generating a data structure indicating data blocks in the first storage to copy to the second storage; initiating a drain operation to copy the data blocks indicated in the first storage to the second storage for a drain time period; queuing write requests to the data blocks indicated in the data structure during the drain time period, wherein the queued write requests are not completed while queued; gathering metric information based on the writes that occur to data blocks in the first storage during the drain time period; and in response to expiration of the drain time period, determining from the gathered metric information whether to continue the drain operation or terminate the drain operation.
 13. The system of claim 12, wherein the determining from the gathered metric information of whether to continue the drain operation comprises: providing a performance metric threshold; determining a metric value based on the gathered metric information; and comparing the determined metric value to the performance metric threshold to determine whether to continue the drain operation.
 14. The system of claim 13, wherein the performance metric threshold comprises a first performance metric threshold, wherein the drain time period comprises a first drain time period, wherein the determined metric value comprises a first metric value, wherein the drain operation is continued for a second drain time period in response to the continuing of the drain operation, and wherein the operations further comprise: continuing to gather metric information during the second drain time period; in response to the second drain time period expiring, determining a second metric value based on the gathered metric information; and comparing the determined second metric value to a second performance metric threshold to determine whether to continue the drain operation.
 15. The system of claim 13, wherein the gathered metric information indicates received write requests and collisions, wherein collisions comprise write requests directed to data blocks indicated in the data structure as not copied to the second storage, wherein the performance metric threshold indicates a collision ratio threshold, wherein the determined metric value comprises a determined collision ratio of a number of collisions and number of received write requests indicated in the gathered metric information, and wherein the drain operation is continued in response to determining that the determined collision ratio is less than the collision ratio threshold.
 16. The system of claim 13, wherein the gathered metric information indicates a time to complete received write requests, wherein the performance metric threshold indicates a write time threshold, wherein the determined metric value comprises a determined average write time based on the gathered metric information, and wherein the operations further comprise: continuing the drain operation in response to determining that the determined average write time is less than the write time threshold; and terminating the drain operation in response to determining that the determined average write time is greater than the write time threshold.
 17. The system of claim 13, wherein the gathered metric information indicates a number of blocks indicated in the data structure not yet copied to the second storage and a number of collisions, wherein collisions comprise write requests directed to data blocks indicated in the data structure as not copied to the second storage, wherein the performance metric threshold indicates a collision number threshold and a blocks-to-drain threshold, wherein the determined metric value comprises a determined blocks-to-drain and determined number of collisions, and wherein the operations further comprise: continuing the drain operation in response to determining that the determined blocks-to-drain is less than the blocks-to-drain threshold and the determined number of collisions is less than the collision number threshold.
 18. A method, comprising: generating a data structure indicating data blocks in a first storage to copy to a second storage; initiating a drain operation to copy the data blocks indicated in the first storage to the second storage for a drain time period; queuing write requests to the data blocks indicated in the data structure during the drain time period, wherein the queued write requests are not completed while queued; gathering metric information based on the writes that occur to data blocks in the first storage during the drain time period; and in response to expiration of the drain time period, determining from the gathered metric information whether to continue the drain operation or terminate the drain operation.
 19. The method of claim 18, wherein the determining from the gathered metric information of whether to continue the drain operation comprises: providing a performance metric threshold; determining a metric value based on the gathered metric information; and comparing the determined metric value to the performance metric threshold to determine whether to continue the drain operation.
 20. The method of claim 19, wherein the performance metric threshold comprises a first performance metric threshold, wherein the drain time period comprises a first drain time period, wherein the determined metric value comprises a first metric value, wherein the drain operation is continued for a second drain time period in response to the continuing of the drain operation, further comprising: continuing to gather metric information during the second drain time period; in response to the second drain time period expiring, determining a second metric value based on the gathered metric information; and comparing the determined second metric value to a second performance metric threshold to determine whether to continue the drain operation.
 21. The method of claim 19, wherein the gathered metric information indicates received write requests and collisions, wherein collisions comprise write requests directed to data blocks indicated in the data structure as not copied to the second storage, wherein the performance metric threshold indicates a collision ratio threshold, wherein the determined metric value comprises a determined collision ratio of a number of collisions and number of received write requests indicated in the gathered metric information, and wherein the drain operation is continued in response to determining that the determined collision ratio is less than the collision ratio threshold.
 22. The method of claim 19, wherein the gathered metric information indicates a time to complete received write requests, wherein the performance metric threshold indicates a write time threshold, wherein the determined metric value comprises a determined average write time based on the gathered metric information, further comprising: continuing the drain operation in response to determining that the determined average write time is less than the write time threshold; and terminating the drain operation in response to determining that the determined average write time is greater than the write time threshold.
 23. The method of claim 19, wherein the gathered metric information indicates a number of blocks indicated in the data structure not yet copied to the second storage and a number of collisions, wherein collisions comprise write requests directed to data blocks indicated in the data structure as not copied to the second storage, wherein the performance metric threshold indicates a collision number threshold and a blocks-to-drain threshold, wherein the determined metric value comprises a determined blocks-to-drain and determined number of collisions, further comprising: continuing the drain operation in response to determining that the determined blocks-to-drain is less than the blocks-to-drain threshold and the determined number of collisions is less than the collision number threshold.
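
By way of illustration only, and not as part of the claims, the drain decision recited above may be sketched in Python as follows. The sketch covers the data structure of blocks to copy, the queuing of colliding writes, the gathering of write and collision counts, and the collision-ratio comparison at the end of each drain time period. All names (`Drain`, `run_period`, `drain`, `budget`, `traffic`) are hypothetical and do not appear in the specification. Several simplifications are assumed: a drain time period is modeled as a fixed budget of blocks copied rather than elapsed time, all writes of a period arrive before that period's copying, the metrics are cumulative across periods, and queued writes are simply held rather than replayed.

```python
from collections import deque


def collision_ratio(collisions, writes):
    """Ratio of colliding writes to all received writes (0.0 if no writes)."""
    return collisions / writes if writes else 0.0


def should_continue(collisions, writes, threshold):
    """Continue the drain only while the collision ratio is below the threshold."""
    return collision_ratio(collisions, writes) < threshold


class Drain:
    """Hypothetical in-memory model of one drain operation."""

    def __init__(self, first, second, blocks):
        self.first, self.second = first, second
        self.to_copy = set(blocks)   # data structure: blocks not yet copied
        self.queued = deque()        # collision queue of held writes
        self.writes = 0              # gathered metric: writes received
        self.collisions = 0          # gathered metric: writes to uncopied blocks

    def write(self, block, data):
        """Queue a write to an uncopied block; otherwise perform it."""
        self.writes += 1
        if block in self.to_copy:
            self.collisions += 1
            self.queued.append((block, data))  # held, not completed
        else:
            self.first[block] = data

    def run_period(self, budget):
        """Copy up to `budget` blocks (assumed stand-in for one drain time period)."""
        for block in sorted(self.to_copy)[:budget]:
            self.second[block] = self.first[block]
            self.to_copy.discard(block)        # mark block as copied

    def finished(self):
        return not self.to_copy


def drain(d, budget, thresholds, traffic):
    """Run drain periods until completion or termination.

    thresholds: one collision ratio threshold per period; the last is reused.
    traffic: one list of (block, data) writes per period (assumed input).
    Returns "completed" or "terminated".
    """
    for period, writes in enumerate(traffic):
        for block, data in writes:
            d.write(block, data)
        d.run_period(budget)
        if d.finished():
            return "completed"
        threshold = thresholds[min(period, len(thresholds) - 1)]
        if not should_continue(d.collisions, d.writes, threshold):
            return "terminated"
    return "terminated"  # traffic exhausted before the drain completed
```

Passing a decreasing list such as `[0.6, 0.5]` for `thresholds` mirrors the arrangement in which the second collision ratio threshold is less than the first, so that each extension of the drain is granted on stricter terms than the one before it. An average write time or a blocks-to-drain count could be substituted for the collision ratio by replacing `should_continue` with the corresponding comparison.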